Worldwide Fast File Replication on Grid Datafarm

نویسندگان

  • Osamu Tatebe
  • Satoshi Sekiguchi
  • Youhei Morita
  • Satoshi Matsuoka
  • Noriyuki Soda
چکیده

The Grid Datafarm architecture is designed for global petascale data-intensive computing. It provides a global parallel filesystem with online petascale storage, scalable I/O bandwidth, and scalable parallel processing, and it can exploit local I/O in a grid of clusters with tens of thousands of nodes. One of features is that it manages file replicas in filesystem metadata for fault tolerance and load balancing. This paper discusses and evaluates several techniques to support long-distance fast file replication. The Grid Datafarm manages a ranked group of files as a Gfarm file, each file, called a Gfarm file fragment, being stored on a filesystem node, or replicated on several filesystem nodes. Each Gfarm file fragment is replicated independently and in parallel using rate-controlled HighSpeed TCP with network striping. On a US-Japan testbed with 10,000 km distance, we achieve 419 Mbps using 2 nodes on each side, and 741 Mbps using 4 nodes out of 893 Mbps with two transpacific networks.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...

متن کامل

Gfarm V2: a Grid File System That Supports High-performance Distributed and Parallel Data Computing

Grid Datafarm architecture is designed for facilitating reliable file sharing and high-performance distributed and parallel data computing in a Grid across administrative domains by providing a global virtual file system. Gfarm v2 is an attempt to implement a global virtual file system that supports a complete set of standard POSIX APIs, while still retaining the parallel and distributed data c...

متن کامل

Optimization of Docking Conformations Using Grid Datafarm

Grid Datafarm (GFarm) is a Japanese national project that aims to design an infrastructure for global petascale data intensive computing. GFarm tools and APIs are provided to handle large data files in both single filesystem image and local file views. While the Grid Datafarm is originally motivated by high energy physics applications, it is a generic distributed I/O management and scheduling i...

متن کامل

Building A High Performance Parallel File System Using Grid Datafarm and ROOT I/O

Sheer amount of petabyte scale data foreseen in the LHC experiments require a careful consideration of the persistency design and the system design in the world-wide distributed computing. Event parallelism of the HENP data analysis enables us to take maximum advantage of the high performance cluster computing and networking when we keep the parallelism both in the data processing phase, in the...

متن کامل

Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy

Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources that enables users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years creates multiple copies of file and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره cs.PF/0306090  شماره 

صفحات  -

تاریخ انتشار 2003